A Deep Generative Model of Vowel Formant Typology
What makes some types of languages more probable than others? For instance,
we know that almost all spoken languages contain the vowel phoneme /i/; why
should that be? The field of linguistic typology seeks to answer these
questions and, thereby, divine the mechanisms that underlie human language. In
our work, we tackle the problem of vowel system typology, i.e., we propose a
generative probability model of which vowels a language contains. In contrast
to previous work, we work directly with the acoustic information -- the first
two formant values -- rather than modeling discrete sets of phonemic symbols
(IPA). We develop a novel generative probability model and report results based
on a corpus of 233 languages.
Comment: NAACL 2018
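The paper's actual model is neural and fit to real formant measurements; purely to make the generative setup concrete, here is a minimal toy sketch of a generative story over (F1, F2) pairs. The prototype values, the inventory-size prior, and the Gaussian noise step are all assumptions of this sketch, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prototype means (F1, F2) in Hz for a few common vowel
# qualities; illustrative values only, not taken from the paper's corpus.
PROTOTYPES = np.array([
    [280, 2250],   # roughly /i/
    [300,  870],   # roughly /u/
    [700, 1220],   # roughly /a/
    [460, 1900],   # roughly /e/
    [500,  900],   # roughly /o/
], dtype=float)

def sample_language(max_vowels=5, noise_hz=(40.0, 120.0)):
    """Sample one language's vowel system as a set of (F1, F2) points.

    Toy generative story (a stand-in for the paper's model):
      1. draw an inventory size,
      2. pick that many prototype qualities without replacement,
      3. perturb each prototype with Gaussian noise in formant space.
    """
    size = rng.integers(3, max_vowels + 1)                      # step 1
    chosen = rng.choice(len(PROTOTYPES), size, replace=False)   # step 2
    noise = rng.normal(0.0, noise_hz, size=(size, 2))           # step 3
    return PROTOTYPES[chosen] + noise

print(sample_language())
```

The point of working in formant space, as the abstract notes, is that nearby (F1, F2) points correspond to perceptually similar vowels, which a discrete IPA-symbol model cannot capture.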
One-Shot Neural Cross-Lingual Transfer for Paradigm Completion
We present a novel cross-lingual transfer method for paradigm completion, the
task of mapping a lemma to its inflected forms, using a neural encoder-decoder
model, the state of the art for the monolingual task. We use labeled data from
a high-resource language to increase performance on a low-resource language. In
experiments on 21 language pairs from four different language families, we
obtain up to 58% higher accuracy than without transfer and show that even
zero-shot and one-shot learning are possible. We further find that the degree
of language relatedness strongly influences the ability to transfer
morphological knowledge.
Comment: Accepted at ACL 2017
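To make the transfer recipe concrete, below is a minimal character-level encoder-decoder sketch in PyTorch: pretrain on high-resource pairs, then fine-tune on a single low-resource pair (one-shot). The toy Spanish/Portuguese data, the single-layer GRU architecture, and the hyperparameters are this sketch's assumptions, not the paper's exact setup, which also conditions on morphological tags.

```python
import torch
import torch.nn as nn

# Hypothetical toy data: lemma -> 1st-person singular present form.
high_resource = [("hablar", "hablo"), ("comer", "como"), ("vivir", "vivo")]
low_resource = [("falar", "falo")]  # one-shot target-language data

chars = sorted({c for s, t in high_resource + low_resource for c in s + t})
PAD, BOS, EOS = 0, 1, 2
stoi = {c: i + 3 for i, c in enumerate(chars)}
V = len(stoi) + 3

def encode(s):
    return torch.tensor([stoi[c] for c in s])

class Seq2Seq(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.emb = nn.Embedding(V, dim)
        self.enc = nn.GRU(dim, dim, batch_first=True)
        self.dec = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, V)

    def forward(self, src, tgt_in):
        _, h = self.enc(self.emb(src))          # encode the lemma
        y, _ = self.dec(self.emb(tgt_in), h)    # decode character by character
        return self.out(y)

def train(model, pairs, steps, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        for src, tgt in pairs:
            s, t = encode(src).unsqueeze(0), encode(tgt)
            tgt_in = torch.cat([torch.tensor([BOS]), t]).unsqueeze(0)
            tgt_out = torch.cat([t, torch.tensor([EOS])]).unsqueeze(0)
            loss = loss_fn(model(s, tgt_in).squeeze(0), tgt_out.squeeze(0))
            opt.zero_grad(); loss.backward(); opt.step()

def greedy_decode(model, src, max_len=16):
    itos = {i: c for c, i in stoi.items()}
    with torch.no_grad():
        _, h = model.enc(model.emb(encode(src).unsqueeze(0)))
        tok, out = torch.tensor([[BOS]]), []
        for _ in range(max_len):
            y, h = model.dec(model.emb(tok), h)
            tok = model.out(y).argmax(-1)
            if tok.item() == EOS:
                break
            out.append(itos.get(tok.item(), "?"))
    return "".join(out)

model = Seq2Seq()
train(model, high_resource, steps=200)  # phase 1: high-resource pretraining
train(model, low_resource, steps=50)    # phase 2: one-shot fine-tuning
print(greedy_decode(model, "falar"))    # ideally "falo" after fine-tuning
```

The fine-tuning phase reuses all pretrained parameters, which is why relatedness between the two languages matters: the more the inflectional patterns overlap, the more of the pretrained mapping transfers.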
Context-Aware Prediction of Derivational Word-forms
Derivational morphology is a fundamental and complex characteristic of
language. In this paper we propose the new task of predicting the derivational
form of a given base-form lemma that is appropriate for a given context. We
present an encoder--decoder style neural network to produce a derived form
character-by-character, based on its corresponding character-level
representation of the base form and the context. We demonstrate that our model
is able to generate valid context-sensitive derivations from known base forms,
but is less accurate in a lexicon-agnostic setting.
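As a concrete illustration of the task's input/output structure, the short sketch below shows hypothetical (context, base form) -> derived form examples and one plausible way to linearize them for a character-level encoder; both the examples and the separator-based encoding are assumptions of this sketch, not the paper's data or exact input format.

```python
# Derivational prediction: given a base-form lemma and a sentential context,
# produce the derived form that fits the slot. Hypothetical examples:
examples = [
    ({"context": "the ___ of the method", "base": "simple"},  "simplicity"),
    ({"context": "she spoke very ___",    "base": "passion"}, "passionately"),
]

# One plausible encoder input, assumed here for illustration: concatenate
# the context tokens and the base form's characters with a separator symbol.
def encoder_input(ex):
    return ex["context"].split() + ["<sep>"] + list(ex["base"])

print(encoder_input(examples[0][0]))
# ['the', '___', 'of', 'the', 'method', '<sep>', 's', 'i', 'm', 'p', 'l', 'e']
```

Note how the same base form can demand different derived forms in different contexts (e.g. "simplicity" vs. "simplification"), which is what makes the task context-sensitive rather than a pure string-transduction problem.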
A Fast Algorithm for Computing Prefix Probabilities
Multiple algorithms are known for efficiently calculating the prefix
probability of a string under a probabilistic context-free grammar (PCFG). Good
algorithms for the problem have a runtime cubic in the length of the input
string. However, some proposed algorithms are suboptimal with respect to the
size of the grammar. This paper proposes a novel speed-up of Jelinek and
Lafferty's (1991) algorithm, which runs in O(n^3 |N|^3 + |N|^4), where n is
the input length and |N| is the number of non-terminals in the grammar. In
contrast, our speed-up runs in O(n^2 |N|^3 + n^3 |N|^2).
Comment: To be published in the Proceedings of ACL 2021
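To make the underlying problem concrete, here is a small self-contained sketch that computes a prefix probability exactly for a toy PCFG in Chomsky normal form, in the spirit of Jelinek and Lafferty's dynamic program: CKY inside probabilities, a left-corner closure matrix, and a right-to-left pass over suffix positions. The toy grammar and this particular formulation are assumptions of the sketch, not the paper's optimized algorithm.

```python
import numpy as np
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form.
# Binary rules: (A, B, C) -> prob ; lexical rules: (A, word) -> prob.
binary = {("S", "NP", "VP"): 1.0,
          ("NP", "Det", "N"): 1.0,
          ("VP", "V", "NP"): 1.0}
lexical = {("Det", "the"): 1.0,
           ("N", "dog"): 0.5, ("N", "cat"): 0.5,
           ("V", "saw"): 1.0}

nts = sorted({a for (a, _, _) in binary} | {a for (a, _) in lexical}
             | {b for (_, b, _) in binary} | {c for (_, _, c) in binary})
idx = {a: i for i, a in enumerate(nts)}
m = len(nts)

def inside(words):
    """CKY inside probabilities: beta[(i, j, A)] = P(A =>* w_i..w_j)."""
    n = len(words)
    beta = defaultdict(float)
    for i, w in enumerate(words):
        for (a, word), q in lexical.items():
            if word == w:
                beta[(i, i, a)] += q
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for (a, b, c), p in binary.items():
                for k in range(i, j):
                    beta[(i, j, a)] += p * beta[(i, k, b)] * beta[(k + 1, j, c)]
    return beta

def prefix_probability(words):
    """P(S =>* words . v), summed over all suffixes v."""
    n = len(words)
    beta = inside(words)
    # Left-corner matrix: P_L[A, B] = total probability of rules A -> B C.
    P_L = np.zeros((m, m))
    for (a, b, c), p in binary.items():
        P_L[idx[a], idx[b]] += p
    closure = np.linalg.inv(np.eye(m) - P_L)  # sums over left-corner chains
    pre = np.zeros((n, m))  # pre[i, A] = P(A's yield begins with w_i..w_n)
    for i in range(n - 1, -1, -1):
        base = np.zeros(m)
        for (a, word), q in lexical.items():
            if i == n - 1 and word == words[i]:
                base[idx[a]] += q
        for (a, b, c), p in binary.items():
            # B covers w_i..w_k completely; C carries the rest of the prefix.
            for k in range(i, n - 1):
                base[idx[a]] += p * beta[(i, k, b)] * pre[k + 1, idx[c]]
        pre[i] = closure @ base
    return pre[0, idx["S"]]

print(prefix_probability(["the", "dog", "saw", "the"]))  # -> 0.5
```

The (I - P_L)^{-1} closure sums over arbitrarily long chains of left-corner expansions, which is what lets the observed prefix end in the middle of a constituent; it is also where the |N|^3-style grammar factors in the complexity bounds come from.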